Updating a dataset of labelled objects on raw video sequences with unique object IDs

We present an update to the previously published dataset known as SFU-HW-Objects-v1. The new dataset is called SFU-HW-Tracks-v1 and contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. For each video frame, the ground truth annotations include the object class ID, the object ID, and the location and dimensions of the bounding box. The dataset can be used to evaluate object tracking performance on uncompressed video sequences and to study the relationship between video compression and object tracking, which was not possible using SFU-HW-Objects-v1.


Specifications
Subject: Computer Science
Specific subject area: Computer Vision and Pattern Recognition
Type of data: Annotations (text files)
How the data were acquired: The annotated data were obtained by assigning a unique object ID to each object in the existing object detection dataset SFU-HW-Objects-v1. This was done by semi-automated tracking followed by manual inspection and correction, as described in the article.
Data format: Analyzed
Description of data collection: The data were generated by applying correlation tracking to the object detection labels in SFU-HW-Objects-v1, followed by manual correction of the tracks. This produced unique object IDs that identify the same object across multiple frames; such IDs do not exist in the original dataset.

Value of the Data
• New data include unique object IDs, which identify the same object across multiple frames of the uncompressed HEVC v1 CTC test sequences.
• The expanded dataset enables benchmarking of object trackers on the uncompressed HEVC v1 CTC test sequences and can be used to study the relationship between video compression and object tracking.

Data Description
We prepared object tracking annotations for 13 High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) video sequences in the YUV420 format [2], as shown in Table 1. These sequences are uncompressed and can be obtained from the Joint Collaborative Team on Video Coding (JCT-VC). The new data extend the previously published SFU-HW-Objects-v1 dataset, and the extended dataset is called SFU-HW-Tracks-v1. For each video frame, the ground truth annotations include the object class ID, the object ID, and the location and dimensions of the bounding box. The SFU-HW-Tracks-v1 dataset has a separate folder for each class of sequences (B, C, D, E), which differ in resolution, and each class folder contains one folder per sequence. Each sequence folder contains one annotation file per frame; these are plain text files that can be viewed in any text editor. Each row of an annotation file corresponds to one object in that frame and contains the following information: [Class ID, Object ID, x, y, w, h]. Class ID is the identifier of an object class, for example "person" or "bicycle". All class IDs are listed in Table 2, and all of them are Common Objects in Context (COCO) object classes [3]. Object ID is the unique identity of each object within its class. For example, if a frame contains two persons, they are assigned the IDs person 0 and person 1. Finally, x and y are the horizontal and vertical coordinates of the center of the object's bounding box in relative coordinates (relative to the frame dimensions, as explained below), while w and h are the relative width and height of the bounding box.
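To make the row layout concrete, the following is a minimal sketch of reading one annotation file into typed records. It assumes the six fields are separated by whitespace, which this excerpt does not state explicitly; the names `Annotation` and `parse_annotations`, and the example values, are hypothetical and not part of the dataset's tooling.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """One row of an annotation file: [Class ID, Object ID, x, y, w, h]."""
    class_id: int   # COCO-based class identifier (e.g. 0 = person)
    object_id: int  # identity of the object within its class
    x: float        # relative horizontal coordinate of the box center
    y: float        # relative vertical coordinate of the box center
    w: float        # relative box width
    h: float        # relative box height


def parse_annotations(text: str) -> List[Annotation]:
    """Parse annotation rows, assuming whitespace-separated fields (hypothetical helper)."""
    rows: List[Annotation] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        class_id, object_id = int(fields[0]), int(fields[1])
        x, y, w, h = (float(v) for v in fields[2:])
        rows.append(Annotation(class_id, object_id, x, y, w, h))
    return rows


# Hypothetical file content: one person (class 0) and one sports ball (class 32).
example = "0 0 0.512 0.430 0.061 0.250\n32 0 0.700 0.200 0.020 0.035\n"
for ann in parse_annotations(example):
    print(ann.class_id, ann.object_id, ann.x, ann.y)
```

Using a `NamedTuple` keeps each row immutable and lets fields be accessed by name, which avoids mixing up the class ID and object ID columns.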
The relative coordinates of the bounding box center, x and y, are obtained from the absolute coordinates x* and y* (measured in pixels from the top-left corner of the frame) and the frame width N and height M as

$$x = \frac{x^*}{N}, \qquad y = \frac{y^*}{M}.$$

Similarly, the relative bounding box width and height, w and h, are obtained from the absolute width and height, w* and h*, as

$$w = \frac{w^*}{N}, \qquad h = \frac{h^*}{M}.$$

The conversion between relative and absolute coordinates is also explained in [1].
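The conversion can be sketched in code, assuming it is the plain division of horizontal quantities by the frame width N and vertical quantities by the frame height M. The helper names below, including `corner_box` for recovering pixel corners, are hypothetical and only illustrate the arithmetic.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): center and size


def to_relative(box_abs: Box, frame_w: int, frame_h: int) -> Box:
    """Absolute pixel center/size -> relative values (divide by N and M)."""
    x_abs, y_abs, w_abs, h_abs = box_abs
    return (x_abs / frame_w, y_abs / frame_h, w_abs / frame_w, h_abs / frame_h)


def to_absolute(box_rel: Box, frame_w: int, frame_h: int) -> Box:
    """Inverse mapping: multiply relative values by N and M."""
    x, y, w, h = box_rel
    return (x * frame_w, y * frame_h, w * frame_w, h * frame_h)


def corner_box(box_rel: Box, frame_w: int, frame_h: int) -> Box:
    """Pixel (left, top, right, bottom) of a box stored as relative center/size.

    The y axis points down, since coordinates are measured from the top-left corner.
    """
    xc, yc, w, h = to_absolute(box_rel, frame_w, frame_h)
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# Hypothetical box in a 1920x1080 (class B) frame: center (960, 540), size 192x108.
rel = to_relative((960.0, 540.0, 192.0, 108.0), 1920, 1080)
print(rel)
print(corner_box(rel, 1920, 1080))
```

Storing relative values makes the annotations independent of resolution, so the same row format serves all sequence classes; the frame size is needed only when converting back to pixels.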
The folder and file structure of SFU-HW-Tracks-v1 is illustrated in Fig. 1 using the BasketballDrive sequence as an example. The corresponding annotations can be visualized overlaid on the image frame using YOLO Mark [4], as shown in Fig. 2. In the first frame of this sequence, there are four objects of the "person" class (class ID 0) with object IDs 0 to 3, and a single "sports ball" object (class ID 32) with object ID 0. Thus, the combination of class ID and object ID uniquely identifies each annotated object.
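Because the pair (class ID, object ID) identifies an object, per-frame rows can be regrouped into tracks, which is a convenient form for evaluating trackers. The sketch below assumes rows have already been parsed into tuples; the function `build_tracks` and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, float, float, float, float]  # (class_id, object_id, x, y, w, h)
Box = Tuple[float, float, float, float]            # relative (x, y, w, h)
TrackKey = Tuple[int, int]                         # (class_id, object_id)


def build_tracks(frames: Dict[int, List[Row]]) -> Dict[TrackKey, List[Tuple[int, Box]]]:
    """Group per-frame rows into tracks keyed by (class_id, object_id)."""
    tracks: Dict[TrackKey, List[Tuple[int, Box]]] = defaultdict(list)
    for frame_idx in sorted(frames):
        seen = set()
        for class_id, object_id, x, y, w, h in frames[frame_idx]:
            key = (class_id, object_id)
            # The same (class_id, object_id) pair must not appear twice in one frame.
            if key in seen:
                raise ValueError(f"duplicate object {key} in frame {frame_idx}")
            seen.add(key)
            tracks[key].append((frame_idx, (x, y, w, h)))
    return dict(tracks)


# Hypothetical two-frame example: two persons and a sports ball; person 1 leaves.
frames = {
    0: [(0, 0, 0.30, 0.50, 0.05, 0.20), (0, 1, 0.60, 0.52, 0.05, 0.21), (32, 0, 0.45, 0.30, 0.01, 0.02)],
    1: [(0, 0, 0.31, 0.50, 0.05, 0.20), (32, 0, 0.46, 0.28, 0.01, 0.02)],
}
for key, track in sorted(build_tracks(frames).items()):
    print(key, [frame_idx for frame_idx, _ in track])
```

Note that person 0 and sports ball 0 share object ID 0 but remain distinct tracks, because the key includes the class ID.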

Experimental Design, Materials and Methods
Tracking annotations in SFU-HW-Tracks-v1 were created from the object detection annotations in SFU-HW-Objects-v1 [1], which contain the following information for each object: [Class ID, x, y, w, h]. However, SFU-HW-Objects-v1 is not suitable for tracking because it contains no annotation that distinguishes different objects of the same class. We therefore created object IDs that are unique within each class, making it possible to distinguish different objects of the same class. Moreover, the same object keeps the same object ID across frames, which allows tracking metrics to be computed. These object IDs are stored in the second column of the provided annotation files.
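The change of row format, from [Class ID, x, y, w, h] to [Class ID, Object ID, x, y, w, h], can be illustrated for the first frame of a sequence, where no earlier IDs exist yet. This is a hypothetical sketch, not the authors' tool: the helper names, numbering objects per class in input order, and the six-decimal text layout are all assumptions. In later frames, IDs are instead propagated from the previous frame by the tracking procedure described below.

```python
from typing import Dict, List, Tuple

Detection = Tuple[int, float, float, float, float]        # SFU-HW-Objects-v1: (class_id, x, y, w, h)
TrackRow = Tuple[int, int, float, float, float, float]    # SFU-HW-Tracks-v1: (class_id, object_id, x, y, w, h)


def assign_initial_ids(detections: List[Detection]) -> List[TrackRow]:
    """Number objects 0, 1, 2, ... separately within each class, in input order (assumed)."""
    next_id: Dict[int, int] = {}
    rows: List[TrackRow] = []
    for class_id, x, y, w, h in detections:
        object_id = next_id.get(class_id, 0)
        next_id[class_id] = object_id + 1
        # The object ID is placed in the second column, right after the class ID.
        rows.append((class_id, object_id, x, y, w, h))
    return rows


def format_row(row: TrackRow) -> str:
    """Serialize a row as whitespace-separated text (assumed file layout and precision)."""
    class_id, object_id, x, y, w, h = row
    return f"{class_id} {object_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


# Hypothetical detections: four persons (class 0) and one sports ball (class 32).
detections = [
    (0, 0.20, 0.55, 0.05, 0.30),
    (0, 0.40, 0.50, 0.05, 0.28),
    (32, 0.48, 0.35, 0.01, 0.02),
    (0, 0.60, 0.52, 0.05, 0.29),
    (0, 0.80, 0.57, 0.06, 0.31),
]
for row in assign_initial_ids(detections):
    print(format_row(row))
```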
We used normalized cross-correlation (NCC) [5] to measure the similarity between two bounding boxes, each containing an object. To find matching locations for objects in neighboring frames (n and n + 1), we computed the NCC for all possible pairs of object bounding boxes between these two frames. For each object bounding box in frame n, we take as its best match the box in frame n + 1 with the highest NCC score. If this NCC score is greater than the threshold value (0.6 in most sequences), we copy the corresponding object ID from frame n to